[PECOBLR-201] add variant support #560


Open · @shivam2680 wants to merge 7 commits into main

Conversation


@shivam2680 (Contributor) commented May 19, 2025

Description

This pull request introduces support for detecting and handling VARIANT column types in the Databricks SQL Thrift backend, along with corresponding tests for validation:

  • Updated the _col_to_description and _hive_schema_to_description methods to process metadata for VARIANT types (a usage sketch follows below).
  • Added unit and end-to-end tests to ensure proper functionality.
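As an illustration only (not part of the PR), a minimal usage sketch of what this change surfaces to callers; the connection values are placeholders, and the exact description tuple is an assumption based on the PR's tests:

```python
# Hedged usage sketch: with this change, cursor.description should report
# "variant" as the type code for VARIANT columns.
# The connection values below are placeholders, not part of this PR.
from databricks import sql

with sql.connect(
    server_hostname="<workspace-host>",   # placeholder
    http_path="<warehouse-http-path>",    # placeholder
    access_token="<token>",               # placeholder
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT PARSE_JSON('{\"a\": 1}') AS v")
        # Expected with this PR applied: type code "variant" instead of "string"
        print(cursor.description[0][:2])  # ('v', 'variant')
```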

Testing details

End-to-End Tests:

  • Added tests/e2e/test_variant_types.py to validate VARIANT type detection and data retrieval. Includes tests for creating tables with VARIANT columns, inserting records, and verifying correct type handling and JSON parsing. A sketch of this shape follows below.
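A hedged sketch of the round trip these tests validate; the table name, the `cursor` fixture, and the JSON-string return assumption are illustrative, not the PR's actual test code:

```python
import json

def test_variant_round_trip(cursor):  # `cursor` assumed to come from an e2e fixture
    cursor.execute("CREATE TABLE IF NOT EXISTS pysql_variant_demo (v VARIANT)")
    cursor.execute("""INSERT INTO pysql_variant_demo SELECT PARSE_JSON('{"a": 1}')""")
    cursor.execute("SELECT v FROM pysql_variant_demo")

    # Type detection added by this PR: description should carry "variant"
    assert cursor.description[0][1] == "variant"

    # VARIANT values are assumed to arrive as JSON text (Arrow physical type: string)
    (value,) = cursor.fetchone()
    assert json.loads(value)["a"] == 1
```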

Unit Tests:

  • Tests cover scenarios like VARIANT type detection, handling of null or malformed metadata, and fallback behavior for missing Arrow schemas.

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@shivam2680 changed the title from "add variant support" to "[PECOBLR-201]. add variant support" on May 19, 2025
@shivam2680 changed the title from "[PECOBLR-201]. add variant support" to "[PECOBLR-201] add variant support" on May 19, 2025

```
@@ -692,12 +692,36 @@ def _col_to_description(col):
    else:
        precision, scale = None, None

    # Extract variant type from field if available
```
Contributor
Are you sure this is correct? I tried and was getting metadata as null when the column type is variant. Also for variant the pyarrow schema just shows string in my testing, shouldn't the server return variant type ?

Contributor Author (@shivam2680)
Yes. Debug output:

```
[SHIVAM] field pyarrow.Field<CAST(1 AS VARIANT): string>
[SHIVAM] field metadata {b'Spark:DataType:SqlName': b'VARIANT', b'Spark:DataType:JsonType': b'"variant"'}
```
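For reference, a small pyarrow sketch that reproduces this shape locally (the field name and metadata values mirror the debug output above):

```python
import pyarrow as pa

# The Arrow physical type is plain string; the VARIANT marker lives in
# field-level metadata, which is what the backend change inspects.
field = pa.field(
    "CAST(1 AS VARIANT)",
    pa.string(),
    metadata={
        b"Spark:DataType:SqlName": b"VARIANT",
        b"Spark:DataType:JsonType": b'"variant"',
    },
)
print(field)                                      # pyarrow.Field<CAST(1 AS VARIANT): string>
print(field.metadata[b"Spark:DataType:SqlName"])  # b'VARIANT'
```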

@jprakash-db (Contributor) commented Jun 17, 2025
@shivam2680 I am getting this as the arrow_schema, where metadata is null. Is this some transient behaviour, or am I missing something?

(Screenshot: arrow_schema output showing null metadata, 2025-06-17)


@shivam2680 requested a review from jprakash-db June 17, 2025 15:03

Comment on lines 706 to 715
```python
if field is not None:
    try:
        # Check for variant type in metadata
        if field.metadata and b"Spark:DataType:SqlName" in field.metadata:
            sql_type = field.metadata.get(b"Spark:DataType:SqlName")
            if sql_type == b"VARIANT":
                cleaned_type = "variant"
    except Exception as e:
        logger.debug(f"Could not extract variant type from field: {e}")
```

Contributor
Please check with eng-sqlgateway whether there is a way to get this from Thrift metadata; the Python connector uses Thrift metadata for fetching column metadata.

Contributor
Is there some documentation/contract around this, or is it purely from empirical evidence?

Contributor Author (@shivam2680)
These schema bytes are read from t_result_set_metadata_resp.arrowSchema itself. Please refer to https://sourcegraph.prod.databricks-corp.com/databricks/databricks-sql-python/-/blob/src/databricks/sql/backend/thrift_backend.py?L812
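In other words, a hedged sketch of what the linked code amounts to (the helper name is made up for illustration):

```python
import pyarrow as pa

def fields_with_metadata(arrow_schema_bytes: bytes):
    """Deserialize the IPC-serialized schema carried in
    t_result_set_metadata_resp.arrowSchema and expose per-field metadata."""
    schema = pa.ipc.read_schema(pa.BufferReader(arrow_schema_bytes))
    return {f.name: (f.metadata or {}) for f in schema}

# Hypothetical usage against a Thrift response object `resp`:
# fields_with_metadata(resp.arrowSchema)["v"].get(b"Spark:DataType:SqlName")
```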


@jprakash-db (Contributor) left a comment
Can you please incorporate @pytest.mark.parametrize? I feel a lot of test code duplication can be avoided.

```
@@ -2356,6 +2356,149 @@ def test_execute_command_sets_complex_type_fields_correctly(
            t_execute_statement_req.useArrowNativeTypes.intervalTypesAsArrow
        )

    def test_col_to_description_with_variant_type(self):
```
Contributor
There is too much code duplication; multiple tests just need different arguments. Can you use pytest fixtures with arguments?


@shivam2680 (Contributor Author)

The test_thrift_backend class inherits from unittest.TestCase, which is designed for unittest's test discovery and execution model. When you use @pytest.mark.parametrize with a unittest.TestCase class:

  • Pytest creates separate test instances for each parameter set
  • Unittest's test runner doesn't understand these parametrized arguments
  • Result: TypeError: missing required positional arguments (a sketch of the workarounds follows below)
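For context, a minimal sketch of this constraint and two common workarounds (all names below are hypothetical, not this PR's code): keep unittest.TestCase and loop with subTest, or move the parametrized tests to a plain pytest class.

```python
import unittest

import pytest


def _classify(metadata):
    # Hypothetical stand-in for the backend's metadata check
    if metadata and metadata.get(b"Spark:DataType:SqlName") == b"VARIANT":
        return "variant"
    return "string"


CASES = [
    ({b"Spark:DataType:SqlName": b"VARIANT"}, "variant"),
    ({}, "string"),    # empty metadata falls back
    (None, "string"),  # missing metadata falls back
]


class TestVariantUnittestStyle(unittest.TestCase):
    def test_variant_detection(self):
        # subTest is the unittest-compatible way to get per-case reporting
        for metadata, expected in CASES:
            with self.subTest(metadata=metadata):
                self.assertEqual(_classify(metadata), expected)


class TestVariantPytestStyle:
    # On a plain (non-TestCase) class, parametrize works as the reviewer suggested
    @pytest.mark.parametrize("metadata, expected", CASES)
    def test_variant_detection(self, metadata, expected):
        assert _classify(metadata) == expected
```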

@shivam2680 requested a review from jprakash-db August 20, 2025 20:58